Voice Activity Detection Using Speech Recognizer Feedback

Authors

  • Kit Thambiratnam
  • Weiwu Zhu
  • Frank Seide
Abstract

This paper demonstrates how feedback from a speech recognizer can be leveraged to improve Voice Activity Detection (VAD) for online speech recognition. First, reliably transcribed segments of audio are fed back by the recognizer as supervision for VAD model adaptation. This allows the much stronger LVCSR (large-vocabulary continuous speech recognition) acoustic models to be harnessed without adding computation. Second, the timing of each VAD decision is dictated by the recognizer rather than the VAD module, which gives VAD an implicit dynamic look-ahead. This look-ahead improves robustness, and it can be gracefully reduced to meet latency requirements without retraining or retuning the VAD module. Experiments on telephone conversations yielded a 6.7% absolute reduction in frame classification error rate when feedback was applied to HMM-based VAD, and a 4.2% absolute reduction over the best baseline system. Furthermore, a 3.0% absolute WER reduction was achieved over the best baseline in speech recognition experiments.
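The first idea, recognizer output used as supervision for VAD adaptation, can be illustrated with a minimal sketch. This is not the paper's HMM-based system: the `GaussianVAD` class, the single log-energy feature, the `min_conf` threshold, and the MAP-style interpolation weight `tau` are all assumptions made for illustration. The sketch also shows how a frame classification error rate, the metric quoted in the abstract, is computed.

```python
import math


class GaussianVAD:
    """Toy two-class VAD over one feature (e.g. frame log-energy).

    Hypothetical stand-in for a real VAD model; both classes share a variance.
    """

    def __init__(self, speech_mean, nonspeech_mean, var=1.0):
        self.mean = {"speech": speech_mean, "nonspeech": nonspeech_mean}
        self.var = var

    def _loglik(self, x, label):
        d = x - self.mean[label]
        return -0.5 * (math.log(2 * math.pi * self.var) + d * d / self.var)

    def classify(self, x):
        if self._loglik(x, "speech") >= self._loglik(x, "nonspeech"):
            return "speech"
        return "nonspeech"

    def adapt(self, feedback, min_conf=0.9, tau=2.0):
        """Shift class means toward frames the recognizer labeled reliably.

        feedback: iterable of (feature, label, confidence) tuples, assumed to
        come from the recognizer's transcribed segments. Frames below
        min_conf are ignored; tau weights the prior mean (MAP-style update).
        """
        feedback = list(feedback)
        for label in self.mean:
            xs = [x for x, lab, conf in feedback
                  if lab == label and conf >= min_conf]
            if xs:
                self.mean[label] = (tau * self.mean[label] + sum(xs)) / (tau + len(xs))


def frame_error_rate(predicted, reference):
    """Fraction of frames whose predicted label differs from the reference."""
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


# Usage: a mismatched channel where non-speech frames are louder than expected.
vad = GaussianVAD(speech_mean=10.0, nonspeech_mean=0.0)
frames = [6.0, 6.5, 12.0, 11.5]
reference = ["nonspeech", "nonspeech", "speech", "speech"]

before = frame_error_rate([vad.classify(x) for x in frames], reference)

# Hypothetical recognizer feedback; the last tuple is too unreliable to use.
vad.adapt([(6.2, "nonspeech", 0.95), (5.8, "nonspeech", 0.97),
           (12.1, "speech", 0.99), (7.0, "speech", 0.4)])

after = frame_error_rate([vad.classify(x) for x in frames], reference)
print(before, after)
```

In this toy setup, adaptation moves the decision boundary upward so that the loud non-speech frames are no longer classified as speech. Because the supervision comes from decoding that the recognizer performs anyway, the adaptation step adds only a small update per segment.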

Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

A robust audio-visual speech recognition using audio-visual voice activity detection

This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments, using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD in noisy conditions, which detects presence ...

Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations / Masataka Goto

This paper describes a novel speech-interface function, called “speech spotter”, which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation since it was not easy to judge, from only microphone input, whether a user was speaking to anothe...

A Mandarin Voice Organizer Based on a Template-Matching Speech Recognizer

From observation of currently available voice organizers, all of them accept only voice commands or word-based commands. Using natural spoken language to operate an organizer is still a difficult problem. In this paper, a template-based speech recognizer which accepts near (constrained) spoken language is proposed. Since the template-based recognizer is a domain-dependent speech recognition system,...

Robust Speech Recognition in a Car Using a Microphone Array

Performance of automatic speech recognition relies on a vast amount of training speech data mostly recorded with little or no background noise. The performance degrades significantly with existence of background noise, which increases type mismatch between train and test environments. Speech enhancement techniques can reduce the amount of type mismatch by extracting reliable speech features fro...


Publication date: 2012